feat: add INLINE + ARROW_STREAM format support for analytics plugin#256
feat: add INLINE + ARROW_STREAM format support for analytics plugin#256jamesbroadhead wants to merge 8 commits intomainfrom
Conversation
Some serverless warehouses only support ARROW_STREAM with INLINE disposition, but the analytics plugin only offered JSON_ARRAY (INLINE) and ARROW_STREAM (EXTERNAL_LINKS). This adds a new "ARROW_STREAM" format option that uses INLINE disposition, making the plugin compatible with these warehouses. Fixes #242
Tests verify: - ARROW_STREAM format passes INLINE disposition + ARROW_STREAM format - ARROW format passes EXTERNAL_LINKS disposition + ARROW_STREAM format - Default JSON format does not pass disposition or format overrides
The server-side ARROW_STREAM format added in the previous commit was not exposed to the frontend or typegen: - Add "ARROW_STREAM" to AnalyticsFormat in appkit-ui hooks - Add "arrow_stream" to DataFormat in chart types - Handle "arrow_stream" in useChartData's resolveFormat() - Make typegen resilient to ARROW_STREAM-only warehouses by retrying DESCRIBE QUERY without format when JSON_ARRAY is rejected Co-authored-by: Isaac Signed-off-by: James Broadhead <jamesbroadhead@gmail.com>
…compatibility ARROW_STREAM with INLINE disposition is the only format that works across all warehouse types, including serverless warehouses that reject JSON_ARRAY. Change the default from JSON to ARROW_STREAM throughout: - Server: defaults.ts, analytics plugin request handler - Client: useAnalyticsQuery, UseAnalyticsQueryOptions, useChartData - Tests: update assertions for new default JSON and ARROW formats remain available via explicit format parameter. Co-authored-by: Isaac Signed-off-by: James Broadhead <jamesbroadhead@gmail.com>
When using the default ARROW_STREAM format, the analytics plugin now automatically falls back through formats if the warehouse rejects one: ARROW_STREAM → JSON → ARROW. This handles warehouses that only support a subset of format/disposition combinations without requiring users to know their warehouse's capabilities. Explicit format requests (JSON, ARROW) are respected without fallback. Co-authored-by: Isaac Signed-off-by: James Broadhead <jamesbroadhead@gmail.com>
|
|
||
| /** Supported data formats for analytics queries */ | ||
| export type DataFormat = "json" | "arrow" | "auto"; | ||
| export type DataFormat = "json" | "arrow" | "arrow_stream" | "auto"; |
There was a problem hiding this comment.
in theory arrow is the same as arrow_stream, so I'm not following what's the problem?
| /** Format configurations in fallback order. */ | ||
| private static readonly FORMAT_CONFIGS = { | ||
| ARROW_STREAM: { | ||
| formatParameters: { disposition: "INLINE", format: "ARROW_STREAM" }, |
There was a problem hiding this comment.
from this URL
https://docs.databricks.com/api/workspace/statementexecution/executestatement#format
Important: The formats ARROW_STREAM and CSV are supported only with EXTERNAL_LINKS disposition. JSON_ARRAY is supported in INLINE and EXTERNAL_LINKS disposition.
so before changing anything this was already supporting arrow, can I know what's the case where this was failing? I would like to see it
|
Seeing that there's a case of Arrow + inline, let's refactor what we had instead of introducing a new format. Let's change the format "ARROW" to "ARROW_STREAM" and allow it to use both "EXTERNAL_LINKS" and "INLINE". Then for now let's keep JSON + inline as the default. This might require some UI hooks changes too |
Previously, _transformDataArray unconditionally called updateWithArrowStatus for any ARROW_STREAM response, which discards inline data and returns only statement_id + status. This was designed for EXTERNAL_LINKS (where data is fetched separately) but broke INLINE disposition where data is in data_array. Changes: - _transformDataArray now checks for data_array before routing to the EXTERNAL_LINKS path: if data_array is present, it falls through to the standard row-to-object transform. - JSON format now explicitly sends JSON_ARRAY + INLINE rather than relying on connector defaults. This prevents the connector default format from leaking into explicit JSON requests. - Connector defaults reverted to JSON_ARRAY for backward compatibility with classic warehouses (the analytics plugin sets formats explicitly). - Added connector-level tests for _transformDataArray covering ARROW_STREAM + INLINE, ARROW_STREAM + EXTERNAL_LINKS, and JSON_ARRAY paths. Co-authored-by: Isaac Signed-off-by: James Broadhead <jamesbroadhead@gmail.com>
Some serverless warehouses return ARROW_STREAM + INLINE results as base64 Arrow IPC in `result.attachment` rather than `result.data_array`. This adds server-side decoding using apache-arrow's tableFromIPC to convert the attachment into row objects, producing the same response shape as JSON_ARRAY regardless of warehouse backend. This abstracts a Databricks internal implementation detail (different warehouses returning different response formats) so app developers get a consistent `type: "result"` response with named row objects. Changes: - Add apache-arrow@21.1.0 as a server dependency (already used client-side) - _transformDataArray detects `attachment` field and decodes via tableFromIPC - Connector tests use real base64 Arrow IPC captured from a live serverless warehouse, covering: classic JSON_ARRAY, classic EXTERNAL_LINKS, serverless INLINE attachment, data_array fallback, and edge cases Co-authored-by: Isaac Signed-off-by: James Broadhead <jamesbroadhead@gmail.com>
|
Hi Mario — thanks for the review, you were right on both counts. I've rebased and reworked the PR: What's happening: Serverless warehouses return What I changed (per your suggestion):
Also added 147 new unit tests covering major coverage gaps (service-context 7%→100%, stream-registry 32%→100%, genie connector 61%→97%, files plugin 69%→89%). All 1711 tests pass. Cleaned up the branch — it's now TS-only, no unrelated changes. |
Summary
ARROW_STREAMwithINLINEdisposition, but the analytics plugin only offeredJSON_ARRAY(INLINE) andARROW_STREAM(EXTERNAL_LINKS)"ARROW_STREAM"format option that usesINLINEdisposition, making the plugin compatible with these warehousesAnalyticsFormattype to include"ARROW_STREAM"Test plan
useAnalyticsQuerywithformat: "ARROW_STREAM"returns results"JSON"and"ARROW"formats are unaffectedFixes #242
This pull request was AI-assisted by Isaac.